For this optional assignment we will be recreating this plot from The Economist:
Import the ggplot2 data.table libraries and use fread to load the csv file ‘Economist_Assignment_Data.csv’ into a dataframe called df (Hint: use drop=1 to skip the first column)
library(ggplot2)
library(data.table)
df <- fread('Economist_Assignment_Data.csv',drop=1)
Check the head of df
head(df)
## Country HDI.Rank HDI CPI Region
## 1: Afghanistan 172 0.398 1.5 Asia Pacific
## 2: Albania 70 0.739 3.1 East EU Cemt Asia
## 3: Algeria 96 0.698 2.9 MENA
## 4: Angola 148 0.486 2.0 SSA
## 5: Argentina 45 0.797 3.0 Americas
## 6: Armenia 86 0.716 2.6 East EU Cemt Asia
Use ggplot() + geom_point() to create a scatter plot object called pl. You will need to specify x=CPI and y=HDI and color=Region as aesthetics
pl <- ggplot(df,aes(x=CPI, y=HDI,color=Region))
print(pl + geom_point())
Change the points to be larger empty circles. (You’ll have to go back and add arguments to geom_point() and reassign it to pl.) You’ll need to figure out what shape= and size=
print(pl + geom_point(size=3,shape=1))
Add geom_smooth(aes(group=1)) to add a trend line
print(pl + geom_point(size=3,shape=1) + geom_smooth(aes(group=1)))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Draw the log-transformed graph, remove shades, and change the trend color
pl2 <- pl + geom_point(size=3,shape=1) + geom_smooth(aes(group=1), method = 'lm',formula = y~log(x), se = FALSE, color = 'red')
print(pl2)
It’s really starting to look similar! But we still need to add labels, we can use geom_text! Add geom_text(aes(label=Country)) to pl2 and see what happens. (Hint: It should be way too many labels)
print(pl2 + geom_text(aes(label=Country)))
Labeling a subset is actually pretty tricky! So we’re just going to give you the answer since it would require manually selecting the subset of countries we want to label!
pointsToLabel <- c("Russia", "Venezuela", "Iraq", "Myanmar", "Sudan",
"Afghanistan", "Congo", "Greece", "Argentina", "Brazil",
"India", "Italy", "China", "South Africa", "Spane",
"Botswana", "Cape Verde", "Bhutan", "Rwanda", "France",
"United States", "Germany", "Britain", "Barbados", "Norway", "Japan",
"New Zealand", "Singapore")
pl3 <- pl2 + geom_text(aes(label = Country), color = "gray20", data = subset(df, Country %in% pointsToLabel),check_overlap = TRUE)
print(pl3)
Almost there! Still not perfect, but good enough for this assignment. Later on we’ll see why interactive plots are better for labeling. Now let’s just add some labels and a theme, set the x and y scales and we’re done! Add theme_bw() to your plot and save this to pl4
pl4 <- pl3 + theme_bw()
print(pl4)
Add scale_x_continuous() and set the following arguments: name = Same x axis as the Economist Plot limits = Pass a vector of appropriate x limits breaks = 1:10
pl5 <- pl4 + scale_x_continuous(name='Corruption Perceptions Index, 2011 (10=least corrupt)',limits = c(0.5,10.5),breaks = 1:10)
print(pl5)
Now use scale_y_continuous to do similar operations to the y axis!
pl6 <- pl5 + scale_y_continuous(name = 'Human Development Index, 2011 (1=Best)',limits = c(0.2,1.0), breaks = seq(0.2,1,by=0.2))
print(pl6)
Finally use ggtitle() to add a string as a title.
pl7 <- pl6 + ggtitle("Corruption and Human development")
print(pl7)
library(ggthemes)
pl8 <- pl7 + theme_economist_white()
print(pl8)